Add learnings from code review analysis by neubig · Pull Request #2280 · OpenHands/software-agent-sdk

neubig · 2026-03-03T15:01:45Z

Summary

Adds code review learnings distilled from PR feedback analysis to AGENTS.md and creates a new eval-risk-assessment skill.

Generated using the learn-from-code-review skill.

Details

Analysis Summary

PRs analyzed: 15 merged PRs from the past 30 days
Comments processed: 25+ meaningful review comments
Patterns identified: 4 categories

New Skill: `.openhands/skills/eval-risk-assessment/SKILL.md`

Guidelines for identifying and flagging changes that could affect benchmark/evaluation performance:

Prompt template changes require lightweight evals
Tool description modifications need review
Agent behavior changes should be flagged

AGENTS.md Updates

Added general guidelines distilled from review feedback:

Testing: Test new behavior, not just removed behavior
Input Validation: Validate user inputs early with clear error messages
API Changes: Document breaking changes and mark deprecated endpoints

Testing

Content-only PR (documentation/skills). No unit tests applicable.

Verified:

Markdown renders correctly
Skill follows SKILL.md format conventions
AGENTS.md changes are additive and don't break existing content

Evidence

Content-only PR - no live run required.

The learnings were validated by cross-referencing with the source PR reviews (links included in the PR files).

Checklist

Skill follows SKILL.md format
AGENTS.md updates are accurate
Source PRs referenced for traceability
Already approved by reviewer

Distilled patterns from recent PR reviews into: - New skill: .openhands/skills/eval-risk-assessment/SKILL.md - AGENTS.md guideline updates (Code Review Learnings section) Analysis summary: - PRs analyzed: 15 merged PRs from the past 30 days - Comments processed: 25+ review comments - Patterns identified: - Eval Risk Assessment (3+ occurrences) - Testing new behavior - Input validation - API documentation Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-03T15:02:11Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.12.0 against 1.11.5
::warning file=openhands-sdk/openhands/sdk/conversation/conversation.py,line=103,title=Conversation.__new__(delete_on_close)::Parameter default was changed: `False` -> `True`
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): prompt_cache_retention
Breaking changes detected (2) and version bump policy satisfied (1.11.5 -> 1.12.0)

Action log

github-actions · 2026-03-03T15:02:29Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

Focus on actionable guidelines rather than implementation details of how the skills were created. Co-authored-by: openhands <openhands@all-hands.dev>

The AGENTS.md file itself is coding guidelines - no need for a separate wrapper section. Co-authored-by: openhands <openhands@all-hands.dev>

neubig · 2026-03-05T19:57:04Z

@OpenHands for eval-risk-assessment could we instead add a single line to AGENTS.md explaining the cases where integration tests should be run? So this whole PR can be one line.

openhands-ai · 2026-03-05T19:57:24Z

I'm on it! neubig can track my progress at all-hands.dev

Remove the separate eval-risk-assessment skill and consolidate guidance into a single line in the TESTING section of AGENTS.md. Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-03-05T19:59:10Z

Summary

I addressed the reviewer's request to simplify the PR from a separate skill file to a single line in AGENTS.md.

Checklist

✅ Request addressed: Removed the 50-line .openhands/skills/eval-risk-assessment/SKILL.md file and consolidated the guidance into a single line in AGENTS.md

✅ Concise changes: The PR is now exactly one line added to the <TESTING> section:

"For changes to prompt templates, tool descriptions, or agent decision logic, flag the PR for human review and request lightweight integration tests to verify no unexpected impact on benchmark performance."

✅ Already pushed: Changes were committed and pushed to openhands/learn-from-reviews-20260303 in commit dbabdd81

Final Diff

- .openhands/skills/eval-risk-assessment/SKILL.md (deleted, 50 lines)
+ AGENTS.md: 1 line added to TESTING section

The PR now matches the reviewer's request for "one line" capturing when integration tests should be run.

The integration tests can be triggered by adding the 'integration-test' label to a PR, which the agent can do. Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟢 Good Taste - Pragmatic Documentation

The AGENTS.md guideline addition is clear and actionable. Flagging eval-risk changes for integration testing solves a real problem (preventing benchmark regressions) without over-engineering.

🟡 PR Description Mismatch

The description claims to add .openhands/skills/eval-risk-assessment/SKILL.md, but the commit history shows you simplified it to a single line in AGENTS.md instead (see "Simplify eval-risk to single AGENTS.md line"). The PR description should be updated to reflect this pragmatic decision.

Verdict: Content is solid. Just update the description to match what shipped, then this is ready to merge.

neubig mentioned this pull request Mar 3, 2026

Add learn-from-code-review skill OpenHands/extensions#87

Merged

openhands-agent added 2 commits March 3, 2026 16:18

Remove code review references from generated content

38d1306

Focus on actionable guidelines rather than implementation details of how the skills were created. Co-authored-by: openhands <openhands@all-hands.dev>

Remove redundant CODING_GUIDELINES section

c24bbfb

The AGENTS.md file itself is coding guidelines - no need for a separate wrapper section. Co-authored-by: openhands <openhands@all-hands.dev>

Simplify eval-risk to single AGENTS.md line

dbabdd8

Remove the separate eval-risk-assessment skill and consolidate guidance into a single line in the TESTING section of AGENTS.md. Co-authored-by: openhands <openhands@all-hands.dev>

Update eval-risk guidance to mention integration-test label

ec1b77d

The integration tests can be triggered by adding the 'integration-test' label to a PR, which the agent can do. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww approved these changes Mar 6, 2026

View reviewed changes

neubig marked this pull request as ready for review March 7, 2026 01:58

all-hands-bot reviewed Mar 7, 2026

View reviewed changes

enyst changed the title ~~DRAFT: Add learnings from code review analysis~~ Add learnings from code review analysis Mar 7, 2026

xingyaoww merged commit 0b3cb69 into main Mar 7, 2026
33 checks passed

xingyaoww deleted the openhands/learn-from-reviews-20260303 branch March 7, 2026 06:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add learnings from code review analysis#2280

Add learnings from code review analysis#2280
xingyaoww merged 5 commits intomainfrom
openhands/learn-from-reviews-20260303

neubig commented Mar 3, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 3, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 3, 2026 •

edited

Loading

Uh oh!

neubig commented Mar 5, 2026

Uh oh!

openhands-ai bot commented Mar 5, 2026

Uh oh!

openhands-ai bot commented Mar 5, 2026

Uh oh!

all-hands-bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

neubig commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Details

Analysis Summary

New Skill: .openhands/skills/eval-risk-assessment/SKILL.md

AGENTS.md Updates

Testing

Evidence

Checklist

Uh oh!

github-actions bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

API breakage checks (Griffe)

Uh oh!

github-actions bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent server REST API breakage checks (OpenAPI)

Uh oh!

neubig commented Mar 5, 2026

Uh oh!

openhands-ai bot commented Mar 5, 2026

Uh oh!

openhands-ai bot commented Mar 5, 2026

Summary

Checklist

Final Diff

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

🟢 Good Taste - Pragmatic Documentation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

neubig commented Mar 3, 2026 •

edited

Loading

New Skill: `.openhands/skills/eval-risk-assessment/SKILL.md`

github-actions bot commented Mar 3, 2026 •

edited

Loading

github-actions bot commented Mar 3, 2026 •

edited

Loading